ABSTRACT

This guide discusses six sets of features to examine
when purchasing a microcomputer-based statistics program: hardware 
requirements; data management; data processing; statistical 
procedures; printing; and documentation. While the current 
statistical packages have several negative features, they are cost 
saving and convenient for small to moderate data sets when compared 
to mainframe computers. It is important to evaluate a statistical 
analysis program in terms of its versatility regarding the features 
needed most. (BS) 
KEY CHARACTERISTICS OF STATISTICAL ANALYSIS SOFTWARE



There are six sets of features that should be examined when
considering the purchase of a microcomputer-based statistics
program. These include hardware requirements, data management,
data processing, statistical procedures, printing, and 
documentation. Each of these features is examined in more detail 
in the following discussion. 



Data Management 

Two elementary features of statistical programs are the kind 
and amount of data they can handle. In regard to the kind of
data, the question is, can one enter integer, character
(alphanumeric) and/or decimal data? Most programs do not permit
all three kinds of input. And, regarding the amount of data a
program can handle, it is important to ask, what is the total 
possible number of variables, cases, and data items (i.e., the 
product of variables and cases)? Limitations on the amount of 
data are due to a variety of factors. One is that a program can 
operate only on data located in the internal memory of a machine, 
that is, random access memory or RAM. Another factor is that a
program can access data from only a single storage disk. Still 
another factor is the dimensions of the arrays that a program can 
handle. 
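
The arithmetic behind these limits can be sketched briefly. The
Python example below is only an illustration: the 8 bytes per data
item and the 64K of free memory are assumed figures, not values
taken from any particular program.

```python
def data_items(variables, cases):
    """Total data items: the product of variables and cases."""
    return variables * cases


def fits_in_memory(variables, cases, free_bytes, bytes_per_item=8):
    """Return True if the whole data set fits in the free memory.

    bytes_per_item=8 is an assumption (one double-precision number
    per data item); real programs may store items differently.
    """
    return data_items(variables, cases) * bytes_per_item <= free_bytes


# Hypothetical example: 20 variables, 500 cases, 64K of free RAM.
free_ram = 64 * 1024
print(data_items(20, 500))                # 10000 data items
print(fits_in_memory(20, 500, free_ram))  # needs 80,000 bytes: False
```

Running the same check with your own typical numbers of variables
and cases gives a rough first screen before comparing programs.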



Programs vary considerably in their data management 
capabilities. Therefore, in order to decide among programs, it
is necessary to determine one's own requirements for type and
amount of data. An article by Carpenter, Deloria, and
Morganstein (1984) describes these and other characteristics of 
24 different statistical programs. Much of the detailed 
information in this column regarding the capabilities of programs 
was obtained from their article. A word of caution: as with any
article, there may be a problem of information being out of date,
especially in as fast-changing an area as statistical software.
Readers are advised to contact individual producers or
manufacturers to get the latest information on packages. Some
programs, such as Statpro, also offer technical references,
reviews, and names of people who are currently using the
product. These are good sources of information, although they
may be biased toward a particular product.



Actual data management starts with the entry of data into the 
program. The way that a program stores files internally 
influences both data entry and data transfer. If a program 
stores files in a unique way, it may be difficult to use data 
other than those entered directly in the program via the 
keyboard, and it will be difficult to share those data with 
another program (e.g., graphics, word processing). This is
especially true if the storage format is not well documented
since this inhibits one's ability to change the data format to
make it compatible with other file structures. The issues
surrounding documentation will be discussed later in this column.



There are several ways that data can be entered into a 
program. One is directly from the keyboard using the program 
itself. Data can also be transferred into a program from other
programs via the Data Interchange Format (DIF) used by VisiCalc,
for example; by using the standard ASCII files created from
keyboard entries by other programs; or from specially formulated
files created by special editors or by mainframe programs.
Deciding among programs in regard to their mode of data entry and
file structure will depend largely on whether files are likely to
be transferred among programs. If files are typically shared,
then compatibility is a must. If this is not the case, then ease
of data entry is the main criterion for comparing programs.
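
As a rough illustration of why a plain ASCII file makes sharing
easy, the Python sketch below writes a small data set as text and
reads it back. The comma-delimited layout with a first line of
variable names is an assumption made for the example; it is not
the DIF format and not the layout of any specific package.

```python
import csv
import io


def write_ascii(cases, variable_names):
    """Write cases as comma-delimited ASCII text, names on line 1."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(variable_names)
    writer.writerows(cases)
    return out.getvalue()


def read_ascii(text):
    """Read the text back into variable names and numeric cases."""
    rows = list(csv.reader(io.StringIO(text)))
    names, body = rows[0], rows[1:]
    return names, [[float(value) for value in row] for row in body]


text = write_ascii([[1, 3.5], [2, 4.0]], ["id", "score"])
names, cases = read_ascii(text)
print(names)   # ['id', 'score']
print(cases)   # [[1.0, 3.5], [2.0, 4.0]]
```

Because the file is ordinary text, any other program that reads
delimited ASCII could pick up the same data.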

Two data entry procedures exist: case-by-case entry, where
all variables for each case are entered at the same time, and
variable-by-variable entry, where all cases for each variable are
entered as a group. Better programs give you a choice between
these two procedures. However, a single data structure should be
used throughout the program. That is, the cases should be the
rows and the variables the columns, or vice versa, consistently 
throughout a program. 
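
The two entry orders describe the same data seen from two
directions. The Python sketch below, using made-up scores, shows
how data entered case by case (rows) can be turned into a
variable-by-variable layout (columns) and back again.

```python
def cases_to_variables(cases):
    """Turn case-by-case rows into variable-by-variable columns."""
    return [list(column) for column in zip(*cases)]


# Hypothetical data: each row is one case (id, math, reading).
cases = [
    [1, 85, 90],
    [2, 70, 65],
    [3, 92, 88],
]
variables = cases_to_variables(cases)
print(variables)  # [[1, 2, 3], [85, 70, 92], [90, 65, 88]]

# Applying the same step again restores the original layout.
print(cases_to_variables(variables) == cases)  # True
```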



How a program handles errors in data entry or the entry of 
commands is an important factor in its ease of use. Seventeen of
the 24 programs reviewed by Carpenter, Deloria, and Morganstein 
(1984) were rated fair or poor in the extent to which they 
allowed recovery from errors, explained an error, and/or 
indicated the source of error. One-third had poor error handling 
capabilities. Such programs will typically just stop operating, 
and the only way to proceed is to start the program all over 
again, which may mean re-entering all of the current data. 

Similarly, in programs with poor error handling, it may be
difficult to get out of a data management, data processing, or
statistical procedure that was erroneously chosen. In the best
case, one can simply "back out" of the procedure by pressing the
Escape key, for example. In the worst case, the program will
leave you no choice but to exit entirely to recover from the 
error. 



Once data are in the program, procedures for maintaining them 
are needed. At a basic level, maintenance includes adding new 
data, correcting erroneous data, and deleting unneeded data. An 
editor built into a program can be especially convenient for
these purposes. This makes it unnecessary to exit the program
to enter data, or to make changes in the data, and then re-enter
the program to continue with data processing and statistical
analysis.



Data Processing 

There are a number of related program features to look for 
regarding the processing of data and their statistical 
manipulation. Data processing covers the selection of cases, the 
joining of files, the sorting of files, and the transformation of 
data. Often the analysis of one or more subsets of the total 
number of cases is desired. There are several ways to select 
cases for these purposes. The most powerful way is to use 
Boolean AND and OR conditions between ranges of variables. For
example, you may wish to select two discrete groups of students 
for comparison, such as students with math scores from 1 to 3 AND 
reading scores from 1 to 3, OR students with math scores from 8 
to 10 AND reading scores 8 to 10. Other options include 
selecting just a single range of a variable or a set of specific 
values. The procedures currently used are the best guide in 
determining how sophisticated a program's ability to select cases 
needs to be.
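
The student example above can be written out as a short Python
sketch. The record layout and the names used here (math, reading,
select_cases) are assumptions made for illustration only.

```python
def in_range(value, low, high):
    """True if value lies between low and high, inclusive."""
    return low <= value <= high


def select_cases(students):
    """Keep students in the low group OR the high group.

    Each group is defined by an AND of two score ranges.
    """
    selected = []
    for student in students:
        low_group = (in_range(student["math"], 1, 3)
                     and in_range(student["reading"], 1, 3))
        high_group = (in_range(student["math"], 8, 10)
                      and in_range(student["reading"], 8, 10))
        if low_group or high_group:
            selected.append(student)
    return selected


# Hypothetical cases.
students = [
    {"id": 1, "math": 2, "reading": 3},   # low on both: selected
    {"id": 2, "math": 9, "reading": 8},   # high on both: selected
    {"id": 3, "math": 2, "reading": 9},   # mixed: not selected
    {"id": 4, "math": 5, "reading": 5},   # middle: not selected
]
print([s["id"] for s in select_cases(students)])  # [1, 2]
```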



Reviewing current procedures will also help determine which 
of the other following data processing features are desired. For
example, the ability to join new cases, or new variables for
current cases, with an already existing file may be important if
you have a continuously changing data base. Sorting a file is
sometimes necessary to perform subsequent statistical analyses; 
therefore, depending on the type of analyses that you are 
interested in, this feature may be more or less desirable. 
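
A minimal Python sketch of these three operations follows. It
assumes each case is stored as a record with an "id" variable that
identifies it; that layout and the function names are hypothetical.

```python
def join_cases(existing, new_cases):
    """Add new cases to the end of an existing file."""
    return existing + new_cases


def join_variables(existing, new_values, key="id"):
    """Add new variables to current cases, matched on the key.

    Cases with no entry in new_values are left unchanged.
    """
    merged = []
    for case in existing:
        extra = new_values.get(case[key], {})
        merged.append({**case, **extra})
    return merged


def sort_cases(cases, variable):
    """Return the cases sorted on one variable."""
    return sorted(cases, key=lambda case: case[variable])


# Hypothetical file of two cases.
data_file = [{"id": 2, "math": 7}, {"id": 1, "math": 4}]
data_file = join_cases(data_file, [{"id": 3, "math": 9}])
data_file = join_variables(data_file, {1: {"reading": 5},
                                  2: {"reading": 6}})
data_file = sort_cases(data_file, "id")
print(data_file[0])  # {'id': 1, 'math': 4, 'reading': 5}
```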



The transformation of data is also frequently required for 
statistical analyses. Transformations include single variable 
operations such as logical operations, adding a constant, and 
taking logs, as well as mathematical computations involving more 
than one variable. Transformations may be used to create new 
variables and to generate random variables for use in 
simulations. In addition, the weighting of variables is often 
achieved through a transformation process if it cannot be
accomplished directly during statistical analysis. Past
experience is the best source of information for determining the
desirability of these features.
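
The kinds of transformations listed above can be sketched in a few
lines of Python. The function names, the cutoff used for the
logical recode, and the choice of a sum as the two-variable
computation are all assumptions made for the example.

```python
import math
import random


def recode(values, cutoff):
    """Logical operation: 1 if the value meets the cutoff, else 0."""
    return [1 if value >= cutoff else 0 for value in values]


def add_constant(values, constant):
    """Add the same constant to every value of a variable."""
    return [value + constant for value in values]


def take_logs(values):
    """Natural logs; every value must be greater than zero."""
    return [math.log(value) for value in values]


def combine(first, second):
    """Create a new variable from two others (here, their sum)."""
    return [a + b for a, b in zip(first, second)]


def random_variable(n, seed=0):
    """Generate n uniform random values for use in a simulation."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


math_scores = [2, 9, 5]
reading_scores = [3, 8, 5]
print(recode(math_scores, 5))                # [0, 1, 1]
print(add_constant(math_scores, 10))         # [12, 19, 15]
print(combine(math_scores, reading_scores))  # [5, 17, 10]
```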



Statistical Procedures

Programs vary considerably in the types of statistical 
procedures they include and in the variety of statistics 
regarding any one type. Not all programs contain all types of
procedures, and programs include different specific procedures
within a type. Figure 1, adapted from the tables in Carpenter,
Deloria, and Morganstein (1984), shows a sample of the types and
variety of statistical procedures that might be found in a
package. According to these authors, the procedures most
frequently included are summary statistics (especially arithmetic
means, variance and standard deviation), simple and multiple
regression analyses, paired and groups t-tests, and various N-way
ANOVAs. Very few programs appear to have nonparametric
statistics other than the Spearman Rank-Order Correlation. In
addition, of the 24 programs they reviewed, few offer time series
procedures. Therefore, careful attention to the procedures
included in a given package is advised.

Figure 1

Sample of Statistical Procedures*

Descriptive Statistics
    Frequency
    Median
    Percentile
    Minimum/Maximum
    Standard Deviation
    Ranks
    Mean
    Mode
    Range
    Variance
    Skewness
    Kurtosis

Nonparametric Statistics
    Friedman Two-way ANOVA
    Mann-Whitney U
    Kruskal-Wallis H
    Wilcoxon Signed Rank
    Chi-Square Tests (e.g., goodness of fit, log-linear models)
    Kendall's Tau
    Kolmogorov-Smirnov
    Spearman Rank-Order Correlation
    Wald-Wolfowitz Runs
    Kendall's Coefficient of Concordance
    Contingency tables/cross-tabs

Linear Models
    Regression: simple, multiple, polynomial, stepwise
    General Linear Model
    Factor Analysis
    T-tests: paired, groups
    ANOVA: N-way, contrasts, unequal cell size, random effects
    Discriminant Analysis

Time Series
    ARIMA
    Cochrane-Orcutt
    Moving Averages
    Two-Stage Ordinary Least Squares
    Serial Correlation (Autocorrelation Coefficients)

* Adapted from Carpenter, Deloria, and Morganstein (1984)



Statistical accuracy and the time it takes to compute a given 
solution are issues apart from which statistics are available.
In its simplest form, statistical accuracy is whether a program
can calculate the correct answer. The precision of the statistic
can be a problem on microcomputers because of rounding error due
to their limited ability to handle numbers with many digits. In
addition, inaccuracy may result from a problem with the algorithm
used by the program to calculate the statistic or a problem
regarding the condition of the data matrix.
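
How the choice of algorithm interacts with limited precision can
be seen in a small Python sketch. It compares a one-pass
"shortcut" variance formula with a two-pass formula on made-up
values that share a large offset. The data and function names are
assumptions for illustration, and the arithmetic is ordinary
double precision, which carries more digits than many
microcomputer programs of the kind discussed here.

```python
def naive_variance(data):
    """One-pass shortcut: (sum of squares - (sum)^2 / n) / (n - 1).

    Subtracting two huge, nearly equal numbers loses the digits
    that carry the answer.
    """
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in data:
        n += 1
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def two_pass_variance(data):
    """Two passes: find the mean first, then sum squared deviations."""
    n = len(data)
    total = 0.0
    for x in data:
        total += x
    mean = total / n
    sum_sq_dev = 0.0
    for x in data:
        sum_sq_dev += (x - mean) ** 2
    return sum_sq_dev / (n - 1)


# The values 4, 7, 13, 16 have a sample variance of exactly 30.
# Adding a large constant should not change the variance.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]
print(two_pass_variance(data))  # 30.0
print(naive_variance(data))     # far from 30 because of rounding
```

A test of this kind, with data whose correct answer is known in
advance, is one way to check a package's accuracy before relying
on it.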

The time it takes to compute a statistic will depend on how 
the data are handled. In some programs all of the data are 
placed in the computer's internal memory (RAM). This greatly 
enhances the rate of processing, since the only limit is the
internal processing speed of the computer. However, it limits 
the amount of data to that which can fit in RAM. 



By increasing the amount of internal memory, more data can be 
accommodated. External memory on either flexible or hard disks 
can also increase the amount of data that can be accessed, but it 
will limit the speed in comparison to internal memory-based
programs, since the mechanical process of going to the disk to
read data, transferring it into RAM, and then processing it all
takes time. 



Internal processing speed varies considerably from machine to 
machine. Certain machines' speed in processing numbers can be 
enhanced by using one of the many packages based on the 
Intel 8087 math co-processor chip. With this chip and related 
software, speed of processing can be increased up to 180 times 
and accuracy increased to 18-digit precision. 

With some programs, machines, and data sets, it is necessary 
to allow several hours or even an overnight period for processing
to occur. However, this may not be any longer than is necessary 
to complete a similar task on a heavily used mainframe time-share 
system. And, in a short time, a number of such runs may pay for 
the microcomputer in terms of cost and convenience. The best 
advice is to have a data set typical of your research that can be 
used as a benchmark for judging a program's accuracy and speed. 
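
One way to organize such a benchmark is sketched below in Python.
The benchmark function, the tolerance, and the small data set with
a known mean are all hypothetical; in practice the procedure under
test would be whatever the package being evaluated provides.

```python
import time


def benchmark(procedure, data, expected, tolerance=1e-9):
    """Time one run of a procedure and compare it with a known answer."""
    start = time.perf_counter()
    result = procedure(data)
    elapsed = time.perf_counter() - start
    return {
        "result": result,
        "accurate": abs(result - expected) <= tolerance,
        "seconds": elapsed,
    }


def mean(values):
    """Stand-in for the statistical procedure being judged."""
    return sum(values) / len(values)


# Hypothetical benchmark data with a known correct mean of 5.0.
report = benchmark(mean, [2.0, 4.0, 6.0, 8.0], expected=5.0)
print(report["accurate"])  # True
print(report["seconds"])   # elapsed time; varies by machine
```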



Printing

There are times throughout the process of using a statistical
program that printing is desired. When data are being entered,
for example, one may wish to get a printed case-by-case or
variable-by-variable summary in order to verify their accuracy.
Printed results of computations can be helpful, especially those 
of intermediate processes such as a regression equation, which 
may be used in other computations. And, of course, the results 
of analyses should be printed in a format that is consistent with 
conventions (e.g., contingency table, ANOVA table) and should be 
clearly labeled for easy interpretation. 



Beyond the printing of individual results, it is often useful
to present a graphic picture in the form of a scatter plot,
histogram, and so on. These may then be combined with the
results of analysis to form a report. The very best programs
allow control over (1) the placement of titles, labels, and other
textual information (e.g., headings and footnotes), (2) the
selection among various types of graphic depictions of results
(bar, line, pie, scatter plot, histogram, 2-D and 3-D) and
control over their scaling, and (3) the storage of completed
reports for later use. An alternative to having these 
capabilities built into a program is being able to easily 
transfer data to another program that has the specific reporting 
capabilities desired. 



Written Information

Unfortunately, information about data management, data
processing, statistical procedures, speed, accuracy, and so on,
is unevenly treated in the written information accompanying most
programs. This is evident in the fact that the number of pages 
in the documentation of the programs reviewed by Carpenter, 
Deloria, and Morganstein (1984) ranged from 12 to 381. This 
unevenness is true for both the written information about how to 
use most programs and about their technical aspects. The 
documentation for a program can be a major factor in facilitating 
or hampering its use. Good documentation for statistical 
software will at least provide some description of (1) the way 
the software is organised, (2) some basic information about each 
feature of the program, and (3) more detailed information about 
the statistical procedures and when different procedures might be 
selected. 



Welcome additions include several features that can make a 
program easy to use. These might include a set of written 
examples (e.g., sample runs). Both tutorial and "help" features 
regarding how to use each part of the program can also facilitate 
the use of a program. A tutorial is typically a program in 
itself designed to teach the user about a feature by working 
through it step by step at the computer. If a mistake is made,
feedback is provided that can help a person understand what the
mistake was and why it caused a problem. A "help" function is
not a separate program, but instead is built right into a
statistical program. It is used when a question arises regarding 
a particular feature. At that point, additional information is 
given on the screen to help one better understand what the 
feature is and how to use it, but no tutorial process is 
involved, just information. 

Clear illustrations of what will be presented on the screen 
at each step in using a program can be a great help both in 
initially learning a program and in getting the most out of those
features that are used only occasionally. Illustrations or
examples of the kinds of results one can expect to appear on the
screen at each step of the program can be very reassuring.
Similarly, examples showing how to create various printed
versions of results, and what they will look like, can improve 
the ease with which this often-complex feature is mastered. 

Exemplary documentation will include the particular algorithm 
used in each analysis so that the user can be fully aware of its 
assumptions. The best programs will also document instructions 
for modifying formulas to better meet particular situations.
Tables of contents, indexes, and an alphabetized summary of
commands are all features that should be expected in a package,
but which are sometimes not included. 

PROS AND CONS OF STATISTICAL ANALYSIS SOFTWARE 

Most statistical packages have a long way to go before they 
reach the professional appearance and ease of use of word 
processing, data base management, spreadsheet, and other generic
business software. Some of the most negative features of the
packages reviewed by Carpenter, Deloria, and Morganstein (1984)
were: 

• cumbersome nature of handling large data sets and 
in some cases the inability to handle such sets, 

• potential need to buy more than one package to get 
all the statistical procedures desired, 

• poor documentation regarding the number of cases
and variables allowed in relation to the RAM 
available, 

• typically slow processing speed for larger data 
sets, 

• the need to enter the same information about number
of cases, number of variables, file names, etc., 
again and again when going through related 
operations, 

• the need to go through all steps in a menu when 
doing the same procedure with different variables, 

• changing the meaning of key commands without
alerting the user, 

• the lack of escape options in case a mistake is 
made either in entering data or selecting a menu 
item, 

• the tendency of programs to crash when an error was
made, with only uninterpretable messages about the
cause.

It is important to remember that most microcomputer-based
statistical packages have only been on the market a relatively 
short time. Consumer pressure and competition will undoubtedly 
help to increase the quality of programs over the next several 
years. 



On the positive side, the cost savings and convenience of 
using the current crop of statistical packages for small to 
moderate data sets make them an attractive alternative to 
mainframe computer use. To the extent that both input and output 
from a program can be shared with other programs, such as generic 
business packages or mainframe programs, microcomputer-based 
statistical programs offer a substantial enhancement of one's 
computing repertoire. In addition, there are individual programs 
that bring the power and variety of mainframe computer programs 
to the micro, a very attractive possibility. And, as is apparent 
from our software application section, such power and variety can 
offset many of the limitations of poor documentation, difficult
use, and slow processing speed. 



The only way to decide which package is the right one for you 
is to think about the features described in this article in 
relation to the research you do, to talk with others who have 
experience doing similar research using particular packages, and 
then to try out a variety of packages to see for yourself which 
features are most important. 



SELECTING THE RIGHT SOFTWARE 



In one sense, hardware requirements are the first 
characteristic of a program that should be considered and, in 
another sense, they are the last. From a realistic point of 
view, the first criterion for selecting a program is whether it
will run on a machine you already have, or on a machine that you
feel you can afford to buy for statistical and other purposes.
However, within these general constraints, hardware becomes a
secondary consideration, because there is a variety of good
programs to choose from for most of the popular and widely used
machines with operating systems such as Apple DOS 3.3, IBM-PC DOS
or MS-DOS, and CP/M. Within each group, programs vary in terms
of their sophistication and cost and in terms of the specific
hardware system characteristics that they require.

In summary, if a commitment has already been made to purchase
a particular machine, or if there are special budget limitations,
hardware-related requirements are the first features of a program
that should be considered. However, if there are no rigid
constraints, it is best to ignore these requirements for the time
being and move on to the other more substantive features of
statistical analysis programs. 

It is important to evaluate a program in terms of its 
versatility regarding those features you need most. Selection 
may come down to the program(s) with the best ratings on those
features of greatest importance as opposed to those with the best
over-all ratings. This notion of the highest ratings on the most
important features is worth considering. Sometimes pricing,
especially in regard to multiple copies, is the deciding factor
among programs of generally equal ratings. In other cases it may
be that speed, error handling, and versatility (i.e., program
performance) are more important than either ease of use or
support. Therefore, lower ratings on ease of use or support
would not disqualify a program if it was a strong performer.
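
The difference between the best over-all rating and the best
rating on the features that matter most can be made concrete with
a small Python sketch. The two programs, their ratings on a 1 to 5
scale, and the weights are invented for illustration; they do not
come from any published review.

```python
def overall_score(ratings):
    """Plain average of all feature ratings."""
    return sum(ratings.values()) / len(ratings)


def weighted_score(ratings, weights):
    """Average of ratings weighted by each feature's importance."""
    total_weight = sum(weights.values())
    weighted_sum = sum(ratings[feature] * weight
                       for feature, weight in weights.items())
    return weighted_sum / total_weight


# Hypothetical priorities: performance matters most to this user.
weights = {"speed": 3, "error handling": 3,
           "ease of use": 1, "support": 1}

# Hypothetical ratings (1 = poor, 5 = excellent).
program_a = {"speed": 5, "error handling": 5,
             "ease of use": 2, "support": 2}
program_b = {"speed": 3, "error handling": 3,
             "ease of use": 5, "support": 5}

print(overall_score(program_a), overall_score(program_b))  # 3.5 4.0
print(weighted_score(program_a, weights),
      weighted_score(program_b, weights))                  # 4.25 3.5
```

Here the second program has the better over-all average, but the
first comes out ahead once the ratings are weighted by this user's
priorities.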

Using the information provided in this guide will help you to
judge the quality of individual programs. The procedures also 
provide a way to compare programs in a consistent manner. 

Any combination of features is possible. Selection should be 
based, therefore, on a consideration of the combination of 
features most desired for the types of tasks to be performed 
using the program. 

In order to make a sound choice: 

1. Describe your use(s) - what will you use the program 
for? 

2. Identify the features you need - what do you want to
be able to do? 

3. Plan ahead for new needs - what are you likely to
want a year from now? 

4. Consider constraints - What price range, hardware 
(e.g., machine type, printer features) and user
preferences are you limited by? 

5. Put features into a rough priority list - which are 
the most, somewhat, and least important features? 

6. Try out and compare products - which ones have the
features you need and want within your constraints? 

7. Remember support - will there be someone you can
talk to if there are problems after you buy the 
program? 